2022-05-07

Introduction

  • Sequencing -> Barracoda pipeline -> our wrangling + visualization
  • Aim: Build a data-wrangling and visualization pipeline, downstream of the Barracoda pipeline, to explore sequence hits

Flowchart


Project data

The raw Excel file contains several sheets

Loading data

library(readxl)  # read_excel()

setwd("/cloud/project")

# Read the raw Excel file (by default only its first sheet is read)
data <- read_excel("data/_raw/project_data_raw.xlsx")

data
## # A tibble: 118 × 17
##    barcode sample count.1 input.1 input.2 input.3 log_fold_change     p
##    <chr>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>           <dbl> <dbl>
##  1 A18B200 BC372       20     221     128     172          -1.20  0.356
##  2 A19B200 BC372       62     325     203     292          -0.228 0.929
##  3 A20B200 BC372       20     167     132     155          -1.03  0.475
##  4 A21B200 BC372       29     260     170     215          -0.981 0.475
##  5 A22B200 BC372       42     307     185     212          -0.565 0.827
##  6 A23B200 BC372       48     369     247     291          -0.748 0.662
##  7 A24B200 BC372       55     327     196     277          -0.361 0.929
##  8 A25B200 BC372       77     528     348     456          -0.619 0.777
##  9 A18B201 BC372       27     174     116     132          -0.474 0.929
## 10 A19B201 BC372       25     158      90     139          -0.447 0.929
## # … with 108 more rows, and 9 more variables: `-log10(p)` <dbl>,
## #   `masked_p (p = 1 if logFC < 0)` <dbl>, `-log10(masked_p)` <dbl>,
## #   `count.normalised (edgeR)` <dbl>, `input.normalised (edgeR)` <dbl>,
## #   HLA <chr>, Origin <chr>, Peptide <chr>, Sequence <chr>

Loading data - merging sheets

library(readxl)  # excel_sheets(), read_excel()
library(dplyr)   # bind_rows()

setwd("/cloud/project")

# List the names of all sheets in the Excel file
sheet <- excel_sheets("data/_raw/project_data_raw.xlsx")

# Read each sheet into its own data frame; setNames() names the list by sheet
data_frame <- lapply(setNames(sheet, sheet), 
                       function(x) read_excel("data/_raw/project_data_raw.xlsx", 
                                              sheet = x))

# Stack the per-sheet data frames; .id stores each row's sheet name in "Sheet"
data_frame <- bind_rows(data_frame, 
                        .id = "Sheet")
data_frame
## # A tibble: 1,770 × 18
##    Sheet barcode sample count.1 input.1 input.2 input.3 log_fold_change     p
##    <chr> <chr>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>           <dbl> <dbl>
##  1 BC372 A18B200 BC372       20     221     128     172          -1.20  0.356
##  2 BC372 A19B200 BC372       62     325     203     292          -0.228 0.929
##  3 BC372 A20B200 BC372       20     167     132     155          -1.03  0.475
##  4 BC372 A21B200 BC372       29     260     170     215          -0.981 0.475
##  5 BC372 A22B200 BC372       42     307     185     212          -0.565 0.827
##  6 BC372 A23B200 BC372       48     369     247     291          -0.748 0.662
##  7 BC372 A24B200 BC372       55     327     196     277          -0.361 0.929
##  8 BC372 A25B200 BC372       77     528     348     456          -0.619 0.777
##  9 BC372 A18B201 BC372       27     174     116     132          -0.474 0.929
## 10 BC372 A19B201 BC372       25     158      90     139          -0.447 0.929
## # … with 1,760 more rows, and 9 more variables: `-log10(p)` <dbl>,
## #   `masked_p (p = 1 if logFC < 0)` <dbl>, `-log10(masked_p)` <dbl>,
## #   `count.normalised (edgeR)` <dbl>, `input.normalised (edgeR)` <dbl>,
## #   HLA <chr>, Origin <chr>, Peptide <chr>, Sequence <chr>

Viral responses in multiple samples?

  • Goal: Visualize whether specific CD8 T cells that recognize the same viral epitope are found in multiple samples.
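A minimal sketch of the counting behind this question, written in base R so it has no extra dependencies. It assumes a "hit" is a row with log_fold_change > 2 (the threshold used in the database step of this deck) and uses the `sample` and `Peptide` columns of the merged `data_frame`. The helper `samples_per_peptide()` and the toy values are hypothetical, not taken from the project data.

```r
# Hypothetical helper (not from the original pipeline): count, per peptide,
# the number of distinct samples in which it passes the assumed hit threshold.
samples_per_peptide <- function(df, lfc_threshold = 2) {
  # Keep hits only; which() also drops rows where log_fold_change is NA
  hits <- df[which(df$log_fold_change > lfc_threshold), c("sample", "Peptide")]
  # One row per sample/peptide pair, so repeated barcodes are counted once
  pairs <- unique(hits)
  n <- table(pairs$Peptide)
  out <- data.frame(Peptide = names(n), n_samples = as.integer(n),
                    stringsAsFactors = FALSE)
  # Peptides shared by the most samples first
  out <- out[order(-out$n_samples, out$Peptide), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Made-up toy input with the same column names as the merged data_frame
toy <- data.frame(
  sample = c("BC372", "BC372", "BC373", "BC373", "BC374"),
  Peptide = c("pep_A", "pep_B", "pep_A", "pep_B", "pep_A"),
  log_fold_change = c(3.1, 0.5, 2.7, 4.0, 2.2),
  stringsAsFactors = FALSE
)
samples_per_peptide(toy)
```

On the real data this would be called as `samples_per_peptide(data_frame)`; the resulting counts can be passed to `barplot()` or to ggplot2's `geom_col()` for a quick overview.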

Simple overview of viral responses

Donor response database

  • The database has a special format

Appending new responses to the response database

  • Remove irrelevant data (keep only rows with log_fold_change > 2)
  • Match peptides in the database sheet with the ones from the dataset
  • Assign a database row position to the matched peptides
  • Create a data frame with the same number of rows as the database and place the matched peptides in the corresponding rows
  • Append it to the database
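The five steps above can be sketched in base R as follows. This is one possible implementation, not the project's code: it assumes (hypothetically) that the database is a data frame with one row per peptide in a `Peptide` column, and that each new sample is appended as one new column. `append_responses()` and the toy tables are illustrative only.

```r
# Hypothetical sketch of the five steps; assumes the database is a data frame
# with one row per peptide in a column named "Peptide".
append_responses <- function(database, new_data, column_name,
                             lfc_threshold = 2) {
  # 1. Remove irrelevant data: keep rows with log_fold_change > threshold
  hits <- new_data[which(new_data$log_fold_change > lfc_threshold), ,
                   drop = FALSE]
  # 2-3. Match peptides and record their database row position (NA = no match)
  hits$db_row <- match(hits$Peptide, database$Peptide)
  hits <- hits[!is.na(hits$db_row), , drop = FALSE]
  # 4. Column with one slot per database row; matched peptides go in their rows
  new_col <- rep(NA_character_, nrow(database))
  new_col[hits$db_row] <- hits$Peptide
  # 5. Append it to the database as a new column named after the sample
  database[[column_name]] <- new_col
  database
}

# Made-up toy tables
db <- data.frame(Peptide = c("pep_A", "pep_B", "pep_C"),
                 stringsAsFactors = FALSE)
new <- data.frame(Peptide = c("pep_C", "pep_A", "pep_D"),
                  log_fold_change = c(3.5, 1.2, 4.1),
                  stringsAsFactors = FALSE)
updated <- append_responses(db, new, "BC372")
updated
```

`match()` returns only the first matching row, so a peptide listed twice in the database would be filled once; peptides absent from the database (here `pep_D`) are dropped rather than added as new rows.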

Adding database row positions to matching data



Adding values in the right rows




Final product






Questions?